Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[connectors] Google drive parent scripts improvement #9390

Merged
merged 7 commits into from
Dec 16, 2024

Conversation

tdraier
Copy link
Contributor

@tdraier tdraier commented Dec 16, 2024

Description

  • Check for parentId consistency, if parentId is not found, log an error and do not update
  • Do not check documents, only tables have invalid parents array
  • Add retry on upsertFolderNode
  • Add revert script

Risk

Deploy Plan

@tdraier tdraier changed the title [connectors] Google drive parent [connectors] Google drive parent scripts improvement Dec 16, 2024
Copy link
Contributor

@philipperolet philipperolet left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks great, apart from one comment where I'm not sure I understand?
The rest are nits

},
order: [["id", "ASC"]],
limit: QUERY_BATCH_SIZE,
}
);

for (const file of googleDriveFiles) {
loop: for (const file of googleDriveFiles) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is that a label?? wow :)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I did not use labels for years :-D

},
order: [["id", "ASC"]],
limit: QUERY_BATCH_SIZE,
}
);

for (const file of googleDriveFiles) {
loop: for (const file of googleDriveFiles) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: rename googleDriveFiles as googleDriveFolders since in that case we select for that

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I changed again and now gettng folders and csv , so wil keep "files" :-)

const internalId = file.dustFileId;
const driveFileId = file.driveFileId;
const parents = [driveFileId];
let current: string | null = file.parentId;
while (current) {
parents.push(current);
if (typeof parentsMap[current] === "undefined") {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

aren't we guaranteed to err at some point? since at some point we get to the root drive, which is not in googleDriveFiles right?
YPATH so I'm likely missing something there 😅

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the root dirve is indeed in googleDriveFiles, so in most of the case we don't err :-)
But I found many places where we have parentId which do not match any google_drive_file (in that case I can't guess the parents list)

@@ -146,7 +122,7 @@ async function migrate({
limit: QUERY_BATCH_SIZE,
});

for (const sheet of googleDriveSheets) {
loop: for (const sheet of googleDriveSheets) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: Honestly a bit wary about the loop, always a bit hard to read
i'd have a small function computeParents instead of the while, and have it return undefined in error cases continuing without a label (you could also reuse it on docs)
but it's a nit so feel free to ignore

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok I was not very proud of the label things .. extracting a function

Copy link
Contributor

@philipperolet philipperolet left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good, left a last readibility com, feel free to merge once adressed

while (current) {
parents.push(current);
current = parentsMap[current] || null;
const parents = getParents(file.parentId, driveFileId, parentsMap);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hard to understand here; getParents take a fileId and a nodeId, the fileId should be the first parent, the nodeId should be the child, and we get the parents for the child but without the child as parents[0] which is added later, is that it? at a minimum, args should be renamed for this to be clear

would it be possible to make a single function getParents(nodeId, parentsMap) that would return the exact parents array we want, and minimize the unshifts we do later?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

agree the params are unclear, but we can't just use one single nodeId param, because if nodeId is a spreadsheet id it won't be in the parentsMap (which contains only files). The nodeId param is just here for logging.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

replaced nodeId with logger param

@tdraier tdraier merged commit 36c675d into main Dec 16, 2024
3 checks passed
@tdraier tdraier deleted the thomas/check-gdrive-parents-2 branch December 16, 2024 11:05
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants